KMID : 1022420210130030065
Phonetics and Speech Sciences
2021 Volume.13 No. 3 p.65 ~ p.70
Designing a large recording script for open-domain English speech synthesis
Kim Sun-Hee

Kim Ho-Jeong
Lee Yoo-Seop
Kim Bo-Ryoung
Won Yong-Kook
Kim Bong-Wan
Abstract
This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style sentences and 15,928 conversational style sentences), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability score over all sentences is 9.03. The analysis shows that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
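As a rough illustration of the word coverage and phonetic coverage criteria named above, the following minimal Python sketch computes them on toy data. It is not the authors' implementation. The abstract does not state how the measures are defined, so the sketch assumes that type coverage is the share of reference-corpus types that occur in the script, that token coverage is the ratio of script tokens to reference-corpus tokens, and that diphone or triphone coverage is the share of all possible phone n-grams over a given inventory that occur in the script. The function names, the simple tokenizer, and the pre-transcribed phone sequences are all hypothetical.

```python
from collections import Counter
from itertools import product


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace,
    # and strip surrounding punctuation from each word.
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w]


def word_coverage(script_sentences, reference_sentences):
    # Assumed definitions (not taken from the paper):
    #   type coverage  = shared types / reference types
    #   token coverage = script tokens / reference tokens
    script = Counter(t for s in script_sentences for t in tokenize(s))
    reference = Counter(t for s in reference_sentences for t in tokenize(s))
    type_cov = len(set(script) & set(reference)) / len(reference)
    token_cov = sum(script.values()) / sum(reference.values())
    return type_cov, token_cov


def ngram_coverage(phone_sequences, phone_inventory, n):
    # n = 2 gives diphone coverage, n = 3 gives triphone coverage.
    # Assumes each sentence is already transcribed as a list of phones.
    observed = {
        tuple(seq[i:i + n])
        for seq in phone_sequences
        for i in range(len(seq) - n + 1)
    }
    possible = set(product(phone_inventory, repeat=n))
    return len(observed & possible) / len(possible)


if __name__ == "__main__":
    reference = ["The cat sat.", "A cat ran.",
                 "The dog sat on the mat.", "A dog ran."]
    script = reference[:2]  # toy script selected from the reference text
    print(word_coverage(script, reference))
    print(ngram_coverage([["k", "ae", "t"], ["s", "ae", "t"]],
                         ["k", "ae", "t", "s"], 2))
```

Under these assumed definitions, a script can cover a large share of the word types in its source text while containing only a small share of the tokens, which is the pattern the reported figures describe.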
KEYWORD
recording script, speech synthesis, English, word coverage, phonetic coverage, readability
Listed journal information
Korea Research Foundation (KCI)